Fix ARG_NOT_SET deserialization for _date fields in OperatorSerialization#66564
Fix ARG_NOT_SET deserialization for _date fields in OperatorSerialization#66564hwang-cadent wants to merge 2 commits intoapache:mainfrom
Conversation
7005b5f to
e1d1f5a
Compare
``OperatorSerialization._deserialize_field_value`` previously sent every
``*_date`` field to ``_deserialize_datetime`` unconditionally. When an
operator stores a date-suffixed field as ``NOTSET`` (an ``ArgNotSet``
sentinel meaning "use the default at runtime") the value is encoded as
``{__type: ARG_NOT_SET}``. Passing that encoded value to
``_deserialize_datetime`` fails because it is not a datetime payload, so
deserializing such an operator (e.g. ``TriggerDagRunOperator`` whose
``logical_date`` defaults to ``NOTSET``) raised an error.
This change short-circuits ARG_NOT_SET on date fields and restores the
``NOTSET`` singleton, leaving ``None`` and real datetimes unchanged.
Adds a direct unit test in ``test_serialized_objects.py`` covering
``logical_date``, ``start_date`` and ``end_date``.
e1d1f5a to
f2b465e
Compare
potiuk
left a comment
There was a problem hiding this comment.
Overview
Small, focused fix in OperatorSerialization._deserialize_field_value. When a _date-suffixed field round-trips through serialization with the value {Encoding.TYPE: DAT.ARG_NOT_SET}, the code now returns the NOTSET sentinel instead of feeding the dict to _deserialize_datetime (which fails). One unit test verifies the fix for logical_date / start_date / end_date. Real-world repro is TriggerDagRunOperator.logical_date=NOTSET.
Code correctness — questions
1. Why scope the fix to _date fields only?
The existing _deserialize_field_value branches are: resources → from_dict, *_date → _deserialize_datetime, otherwise → return-as-is. The fallback branch would return {Encoding.TYPE: DAT.ARG_NOT_SET} as the literal dict, not as NOTSET. So the same class of bug exists for every non-date field that an operator stores as NOTSET. If TriggerDagRunOperator has only a date-typed NOTSET-able field today, the scope is technically OK — but the fix is fragile (next operator using NOTSET on a string field would silently get a dict back).
A safer shape: hoist the ARG_NOT_SET check above the field-type dispatch.
if isinstance(value, dict) and value.get(Encoding.TYPE) == DAT.ARG_NOT_SET:
from airflow.serialization.definitions.notset import NOTSET
return NOTSET
# ...existing field-name dispatchThat makes ARG_NOT_SET symmetric with how it's serialized (a generic encoding), regardless of field type.
2. The symmetry assumption isn't tested
The test only proves the deserialize side handles the encoding. There's no test that actually serializes TriggerDagRunOperator(logical_date=NOTSET) and confirms the encoding emitted by the serializer is the one this branch matches. A round-trip test (serialize → deserialize → identity) would catch the case where the serializer changes its NOTSET encoding shape (e.g. wrapping in {Encoding.VAR: ..., Encoding.TYPE: ...} instead of the bare {TYPE: ARG_NOT_SET} form the test mocks).
3. endswith(\"_date\") is broad
Matches logical_date, start_date, end_date, plus any future operator field a contributor names with that suffix that isn't actually a datetime (e.g. update_date_str, last_seen_date_iso). Not a regression — just inheriting the existing behavior — but worth noting if you do the hoist above (the check then becomes type-agnostic and you don't widen this surface further).
Test coverage
- Unit test covers the three current fields. Good.
- Missing: round-trip test (operator instance →
SerializedDAG.serialize_dag→SerializedDAG.deserialize_dag→assert .logical_date is NOTSET). That's the test that would have caught the bug at PR-time and would catch the symmetry break. - Missing: parametrize across the three fields rather than a
forloop so a failure in one shows the failing case clearly in pytest output.
Style / conventions
- Inline
from airflow.serialization.definitions.notset import NOTSETinside the method — consistent with how Airflow handles potential circular imports in the serialization module. Fine. - Comment is well-written and names the concrete failure case (TriggerDagRunOperator). Good.
- Docstring on the test explains the scenario clearly. Good.
Other
- No newsfragment. This fixes a user-visible bug (operator with
logical_date=NOTSETwould previously fail to deserialize) — likely warrants a*.bugfix.rstnewsfragment. - No mention of when the regression landed in the PR description. Helpful for downstream maintainers deciding whether to backport to
v3-2-test/etc.
Verdict
The fix is correct for the demonstrated case and the test pins the behavior. Two recommendations before approving:
- Hoist the ARG_NOT_SET check above the field-name dispatch (~3 lines moved) so any future field type is covered, not just
_date. - Add a round-trip test through
SerializedDAGso the serialize/deserialize symmetry is the thing under test (not just the deserialize half).
Add a newsfragment in either case. Otherwise LGTM as a tightly-scoped bug fix.
potiuk
left a comment
There was a problem hiding this comment.
Requesting changes until responses are given.
Summary
OperatorSerialization._deserialize_field_valuepreviously routed every*_datefield directly to_deserialize_datetime. When an operator stores a date-suffixed field asNOTSET(anArgNotSetsentinel meaning "use the default at runtime"), the value is encoded as{__type: ARG_NOT_SET}. Passing that encoding to_deserialize_datetimeraises because it is not a datetime payload, so deserializing such an operator (for exampleTriggerDagRunOperatorwhoselogical_datedefaults toNOTSET) fails.This PR short-circuits the ARG_NOT_SET case on date-suffixed fields and restores the
NOTSETsingleton.Nonevalues and real datetimes keep their existing behavior.Why a separate PR
This was originally bundled inside #56973 (dynamic
dag_idresolution inTriggerDagRunOperatorlinks). Per reviewer feedback from @potiuk, splitting it out keeps each concern independently reviewable, bisectable and backportable. #56973 is being rebased to drop the serialization change once this PR lands.Changes
airflow-core/src/airflow/serialization/serialized_objects.py: handle ARG_NOT_SET in_deserialize_field_valuefor date-suffixed fields.airflow-core/tests/unit/serialization/test_serialized_objects.py: addtest_deserialize_field_value_with_arg_not_set_for_date_fieldscoveringlogical_date,start_date,end_date.Type of change
Testing
_deserialize_field_valuereturnsNOTSETforlogical_date,start_dateandend_datewhen given an ARG_NOT_SET encoding.TriggerDagRunOperatorround-trip tests on Fix dynamic dag_id resolution inTriggerDagRunOperatorlinks #56973.